Taking more time to analyse with many processes #29

@KarthickRaja2002

Hi @RootLUG,

I am invoking Aura through Java ProcessBuilder as 30 processes, with the same zips as input. When I do this, the analysis takes much longer. If a zip is analysed by a single process, it completes within 3 minutes. But when 30 zips are analysed as 30 processes, it takes more than an hour.
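To compare the single-process and many-process cases in a controlled way, the launch pattern described above can be approximated in Python. This is only a sketch: `run_batch` is a hypothetical helper, the command shown is a placeholder rather than the real Aura command line, and it does not reproduce the Java ProcessBuilder code.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, max_parallel):
    """Run each command as its own OS process, at most `max_parallel` at a time.

    Returns a tuple (return_codes, elapsed_seconds).
    """
    def run_one(cmd):
        # Output is discarded; only the exit status is kept.
        done = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return done.returncode

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(run_one, commands))
    return codes, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real scanner invocation.
    placeholder = [sys.executable, "-c", "pass"]
    for parallel in (1, 4):
        codes, elapsed = run_batch([placeholder] * 4, parallel)
        print(f"max_parallel={parallel}: codes={codes} elapsed={elapsed:.2f}s")
```

Running the same batch with different `max_parallel` values gives a rough picture of how total time changes as more processes compete for CPU, memory, and disk on one machine.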

Moreover, the zip contains many nested zips, so I have used a ThreadPoolExecutor with max_workers set to 10 for the extraction alone. I have also changed max-depth in the aura_config.yaml file to 50.

Below is the modified ThreadPoolExecutor code in the package_analyzer.py file. Kindly check it and let me know why the analysis takes so long when invoked through Java with 30 processes.

Thanks in advance!

```python
# Excerpt from package_analyzer.py (static methods of Analyzer)

@staticmethod
def scan_directory(item: base.ScanLocation):
    print(f"Collecting files in a directory '{item.str_location}'")
    dir_executor = futures.ThreadPoolExecutor(max_workers=10)
    dir_executor.submit(Analyzer.scan_dir_by_ThreadPool, item)
    collected = Analyzer.scan_dir_by_ThreadPool(item=item)
    dir_executor.shutdown()
    return collected

@staticmethod
def scan_dir_by_ThreadPool(item: base.ScanLocation):
    """Scanning input directory"""
    topo = TopologySort()
    collected = []
    for f in utils.walk(item.location):
        if str(f).endswith((".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")):
            new_item = item.create_child(
                f,
                parent=item.parent,
                strip_path=item.strip_path
            )
            collected.append(new_item)
            topo.add_node(Path(new_item.location).absolute())
            logger.debug("Computing import graph")
            for x in collected:
                if not x.metadata.get('py_imports'):
                    continue
                node = Path(x.location).absolute()
                topo.add_edge(node, x.metadata['py_imports']['dependencies'])
            topology = topo.sort()
            collected.sort(
                key=lambda x: topology.index(x.location) if x.location in topology else 0
            )
            logger.debug("Topology sorting finished")
    return collected
```
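Two things stand out in the snippet above. `scan_dir_by_ThreadPool` is submitted to the executor and then also called directly, so the directory walk runs twice and the submitted call's result is never used. The import-graph and topology sort steps also sit inside the loop, so they rerun for every matching file. Whether either contributes to the slowdown reported here is not established. The sketch below shows the general pattern of submitting the work once, reading that single result, and doing whole-collection post-processing after the loop. It is a standalone stand-in and not Aura code: `collect_files` and `scan_directory_once` are hypothetical names, `os.walk` replaces `utils.walk`, and a plain sort replaces the topology sort.

```python
import os
from concurrent import futures
from pathlib import Path

# Same suffix filter as in the snippet above.
SCAN_SUFFIXES = (".py", ".zip", ".jar", ".war", ".whl", ".egg", ".gz", ".tgz")


def collect_files(root):
    """Walk `root` once and return the matching paths in sorted order."""
    collected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(SCAN_SUFFIXES):
                collected.append(Path(dirpath, name).absolute())
    # Post-processing of the whole collection happens once, after the loop.
    collected.sort()
    return collected


def scan_directory_once(root, executor):
    """Submit the walk a single time and wait for that one result."""
    future = executor.submit(collect_files, root)
    return future.result()


if __name__ == "__main__":
    with futures.ThreadPoolExecutor(max_workers=10) as pool:
        for path in scan_directory_once(".", pool):
            print(path)
```

Note that a single `submit` call occupies only one worker thread, so a pool of 10 workers does not by itself parallelise one directory walk. The pool helps only when several independent tasks are submitted to it.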
